
    Temporal Sentence Grounding in Streaming Videos

    This paper tackles a novel task, Temporal Sentence Grounding in Streaming Videos (TSGSV), whose goal is to evaluate the relevance between a video stream and a given sentence query. Unlike regular videos, streaming videos are acquired continuously from a particular source and typically must be processed on the fly in applications such as surveillance and live-stream analysis. TSGSV is therefore challenging: the model must infer without access to future frames and must process long frame histories effectively, requirements that earlier methods do not address. To address these challenges, we propose two novel methods: (1) a TwinNet structure that enables the model to learn about upcoming events; and (2) a language-guided feature compressor that eliminates redundant visual frames and reinforces the frames relevant to the query. We conduct extensive experiments on the ActivityNet Captions, TACoS, and MAD datasets. The results demonstrate the superiority of the proposed methods, and a systematic ablation study confirms their effectiveness.
    Comment: Accepted by ACM MM 2023.
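    The abstract describes the language-guided feature compressor only at a high level. As a rough illustration of the general idea, the sketch below scores buffered frame features against the sentence-query embedding and keeps only the most query-relevant ones; the cosine-similarity scoring, the top-k selection, and all names here are illustrative assumptions, not the paper's actual design.

    ```python
    import torch
    import torch.nn.functional as F

    def compress_memory(frame_feats, query_feat, k=32):
        """Keep the k buffered frames most relevant to the query, softly reweighted.

        frame_feats: (T, D) historical frame features; query_feat: (D,) query embedding.
        """
        # Relevance of each buffered frame to the sentence query.
        scores = F.cosine_similarity(frame_feats, query_feat.unsqueeze(0), dim=-1)  # (T,)
        k = min(k, frame_feats.size(0))
        weights, idx = scores.topk(k)  # indices of the most query-relevant frames
        # Reinforce the kept frames with soft relevance weights.
        return frame_feats[idx] * weights.softmax(dim=0).unsqueeze(-1)  # (k, D)

    # Usage: compress 1,000 streamed 512-d frame features into a 32-frame memory.
    mem = compress_memory(torch.randn(1000, 512), torch.randn(512), k=32)
    ```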

    EVE: Efficient zero-shot text-based Video Editing with Depth Map Guidance and Temporal Consistency Constraints

    Motivated by the superior performance of image diffusion models, more and more researchers strive to extend these models to text-based video editing. Nevertheless, current video editing methods mainly suffer from a dilemma between high fine-tuning cost and limited generation capacity. Compared with images, we conjecture that videos necessitate more constraints to preserve temporal consistency during editing. To this end, we propose EVE, a robust and efficient zero-shot video editing method. Under the guidance of depth maps and temporal consistency constraints, EVE derives satisfactory video editing results at an affordable computational and time cost. Moreover, recognizing the absence of a publicly available video editing dataset for fair comparison, we construct a new benchmark, the ZVE-50 dataset. Through comprehensive experimentation, we validate that EVE achieves a satisfactory trade-off between performance and efficiency. We will release our dataset and codebase to facilitate future research.
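    The abstract does not spell out the temporal consistency constraints. A minimal sketch of one such constraint, assuming per-frame diffusion latents and a simple neighboring-frame regularizer (not necessarily what EVE uses), might look like this:

    ```python
    import torch

    def temporal_consistency_loss(latents):
        """latents: (F, C, H, W) per-frame latents of the edited video clip."""
        # Penalize large changes between latents of neighboring frames.
        diff = latents[1:] - latents[:-1]  # (F-1, C, H, W)
        return diff.pow(2).mean()

    # Usage: 8 frames of 4x64x64 diffusion latents.
    loss = temporal_consistency_loss(torch.randn(8, 4, 64, 64))
    ```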

    Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning

    In recent years, the explosion of web videos has made text-video retrieval increasingly essential and popular for video filtering, recommendation, and search. Text-video retrieval aims to rank relevant texts/videos higher than irrelevant ones, and the core of the task is to precisely measure the cross-modal similarity between texts and videos. Recently, contrastive learning methods have shown promising results for text-video retrieval, most of which focus on the construction of positive and negative pairs to learn text and video representations. Nevertheless, they do not pay enough attention to hard negative pairs and lack the ability to model different levels of semantic similarity. To address these two issues, this paper improves contrastive learning with two novel techniques. First, to exploit hard examples for robust discriminative power, we propose a novel Dual-Modal Attention-Enhanced Module (DMAE) that mines hard negative pairs from textual and visual clues. By further introducing a Negative-aware InfoNCE (NegNCE) loss, we can adaptively identify these hard negatives and explicitly highlight their impact in the training loss. Second, our work argues that triplet samples model fine-grained semantic similarity better than pairwise samples. We therefore present a new Triplet Partial Margin Contrastive Learning (TPM-CL) module that constructs partial-order triplet samples by automatically generating fine-grained hard negatives for matched text-video pairs. TPM-CL designs an adaptive token masking strategy with cross-modal interaction to model subtle semantic differences. Extensive experiments demonstrate that the proposed approach outperforms existing methods on four widely used text-video retrieval datasets: MSR-VTT, MSVD, DiDeMo, and ActivityNet.
    Comment: Accepted by ACM MM 2023.
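    To make the NegNCE idea concrete, here is a hedged sketch: a standard InfoNCE term plus an extra penalty on "hard" negatives whose similarity comes within a margin of the positive pair. The margin-based mining rule and the weighting are assumptions for illustration; the paper defines the actual loss.

    ```python
    import torch
    import torch.nn.functional as F

    def neg_aware_infonce(sim, tau=0.05, margin=0.1, alpha=0.5):
        """sim: (B, B) text-video similarity matrix; the diagonal holds positives."""
        labels = torch.arange(sim.size(0), device=sim.device)
        base = F.cross_entropy(sim / tau, labels)  # standard InfoNCE term

        pos = sim.diagonal().unsqueeze(1)  # (B, 1) positive similarities
        off_diag = ~torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
        # Hard negatives: off-diagonal pairs scoring within `margin` of the positive.
        hard = (sim > pos - margin) & off_diag
        penalty = F.softplus(sim[hard] / tau).mean() if hard.any() else sim.new_zeros(())
        return base + alpha * penalty

    # Usage: a batch of 16 text-video pairs.
    loss = neg_aware_infonce(torch.randn(16, 16))
    ```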

    Automatic Car Damage Assessment System: Reading and Understanding Videos as Professional Insurance Inspectors

    We demonstrate a car damage assessment system for the car insurance field based on artificial intelligence techniques, which can exempt insurance inspectors from checking cars on site and help people without professional knowledge evaluate car damage when accidents happen. Unlike existing approaches, we use videos instead of photos to interact with users, making the whole procedure as simple as possible. We adopt object and video detection and segmentation techniques from computer vision and take advantage of multiple frames extracted from videos to achieve high damage recognition accuracy. The system uploads video streams captured by mobile devices, recognizes car damage on the cloud asynchronously, and then returns the damaged components and repair costs to users. The system evaluates car damage and returns results automatically and effectively in seconds, which reduces labor costs and significantly decreases insurance claim time.
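    The multi-frame aggregation step lends itself to a short sketch: run any per-frame damage recognizer over the extracted frames and average its confidences, so a damaged component need only be seen clearly in a few frames. The recognizer interface and the threshold here are hypothetical, not the system's actual design.

    ```python
    from collections import defaultdict
    from typing import Callable, Dict, Iterable

    def assess_video(frames: Iterable,
                     recognize_frame: Callable[[object], Dict[str, float]],
                     threshold: float = 0.6) -> Dict[str, float]:
        """Average per-frame damage confidences; report components above threshold."""
        totals, counts = defaultdict(float), defaultdict(int)
        for frame in frames:
            for component, conf in recognize_frame(frame).items():
                totals[component] += conf
                counts[component] += 1
        return {c: totals[c] / counts[c]
                for c in totals if totals[c] / counts[c] >= threshold}

    # Usage with a dummy recognizer standing in for the real detection model.
    report = assess_video(range(5), lambda f: {"front_bumper": 0.8, "hood": 0.3})
    # -> {"front_bumper": 0.8}
    ```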

    Sperm cells are passive cargo of the pollen tube in plant fertilization

    Sperm cells of seed plants have lost their motility and are transported by the vegetative pollen tube cell for fertilization, but the extent to which they regulate their own transport has been a long-standing debate. Here we show that Arabidopsis plants lacking two bHLH transcription factors produce pollen without sperm cells. This abnormal pollen mostly behaves like the wild type, demonstrating that sperm cells are dispensable for normal pollen tube development.

    Maternal ENODLs Are Required for Pollen Tube Reception in Arabidopsis

    During the angiosperm (flowering-plant) life cycle, double fertilization represents the hallmark between the diploid and haploid generations [1]. The success of double fertilization largely depends on compatible communication between the male gametophyte (pollen tube) and the maternal tissues of the flower, culminating in precise pollen tube guidance to the female gametophyte (embryo sac) and its rupture to release the sperm cells. Several important factors involved in pollen tube reception have been identified recently [2-6], but the underlying signaling pathways are far from understood. Here, we report that a group of female-specific small proteins, early nodulin-like proteins (ENODLs, or ENs), are required for pollen tube reception. ENs feature a plastocyanin-like (PCNL) domain, an arabinogalactan (AG) glycomodule, and a predicted glycosylphosphatidylinositol (GPI) anchor motif. We show that ENs are asymmetrically distributed at the plasma membrane of the synergid cells and accumulate at the filiform apparatus, where arriving pollen tubes communicate with the embryo sac. EN14 strongly and specifically interacts with the extracellular domain of the receptor-like kinase FERONIA, which is localized at the synergid cell surface and known to critically control pollen tube reception [6]. Wild-type pollen tubes failed to arrest growth and to rupture after entering the ovules of quintuple loss-of-function EN mutants, indicating a central role of ENs in male-female communication and pollen tube reception. Moreover, overexpression of EN15 under its endogenous promoter disturbed pollen tube guidance and reduced fertility. These data suggest that female-derived GPI-anchored ENODLs play an essential role in male-female communication and fertilization.